Overview: This script processes water temperature data from water stations (sources and notes on the data below). First I remove data points flagged as failed or suspect. Then I use a 2-hour rolling mean to remove data points more than +/- 2 standard deviations from that rolling mean. Finally, I calculate an hourly median, since data were collected at different frequencies. This sets me up to do analysis between sites and with field data. Sites are graphed in order from closest to the Delta to the mouth of the bay.

Notes on data:

-Accessed and downloaded all data on September 30, 2020.
-Downloaded all data available from 1/1/2017-12/31/2019.
-China Camp: http://cdmo.baruch.sc.edu/aqs/zips.cfm. Data logged every 15 minutes. Requested citation format: NOAA National Estuarine Research Reserve System (NERRS). System-wide Monitoring Program. Data accessed from the NOAA NERRS Centralized Data Management Office website: http://cdmo.baruch.sc.edu/; accessed 30 September 2020.
-Richardson Bay: NERR data. Data logged every 15 minutes.
-EOS: https://oceanview.pfeg.noaa.gov/erddap/tabledap/rtcctdRTCysi.html. Data logged every 6 minutes. Time in UTC. Should I use this link instead? http://erddap.cencoos.org/erddap/tabledap/tiburon-water-tibc1.html. pH and salinity available. No air temperature.
-Fort Point: http://erddap.cencoos.org/erddap/tabledap/fort-point.html?time%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Cz&time%3E%3D2020-09-20T21%3A07%3A41Z&time%3C%3D2020-09-29T12%3A00%3A00Z. Time in UTC. Data logged every 6 minutes.

Next steps:

-Fix the aesthetics of the graphs (font too small, change gradient, clean up keys, etc.)
-Analysis with field data
-Site characterization summary table

Set up

rm(list=ls())

library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes) 
library(cranlogs)
library(knitr)
library(openair)

Make a custom function that calculates the mean, standard deviation (sd), and upper/lower bounds for a numeric vector x. Here "hi" and "lo" are defined as 2 standard deviations from the mean (roughly a 95% range); you can change the equation to use 3 or however many sds you would like. This step only creates the function and does not manipulate the data. na.rm=TRUE removes any "NA" values, and "ret" names the new columns that will be added when you run this on a df.

custom_stat_fun_2<-function(x, na.rm=TRUE){
  m<-mean(x, na.rm=TRUE)   # rolling mean
  s<-sd(x, na.rm=TRUE)     # rolling standard deviation
  hi<-m+2*s                # upper bound: 2 sd above the mean
  lo<-m-2*s                # lower bound: 2 sd below the mean
  ret<-c(mean=m, stdev=s, hi.95=hi, lo.95=lo)
  return(ret)
}
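
A quick sanity check (toy numbers, not station data) shows the named vector the function returns and what tq_mutate will turn into columns:

```r
# Re-define the helper exactly as above, then apply it to a small toy vector.
custom_stat_fun_2 <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  ret <- c(mean = m, stdev = s, hi.95 = m + 2 * s, lo.95 = m - 2 * s)
  return(ret)
}

res <- custom_stat_fun_2(c(12.1, 12.3, 12.2, 12.4, 30))
names(res)  # "mean" "stdev" "hi.95" "lo.95" -- the columns added to the df
```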

China Camp
Read in the data and subset (not strictly needed, but it's a large df). A later step needs a column named "date" with no time stamp.

cc<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/china_camp_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))

cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime,Temp, F_Temp))

cc%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed QC test

cc<-cc[!grepl("-3", cc$F_Temp),]

cc%>%
  ggplot(aes(x=date, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Remove data points flagged as “suspect” too

cc<-cc[!grepl("1", cc$F_Temp),]
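
A note on row removal: the `!grepl()` form (also used for Richardson Bay below) is safer than `-grep()` indexing, because when the pattern never matches, `grep()` returns `integer(0)` and `df[-integer(0), ]` silently keeps zero rows instead of all of them. A minimal illustration with toy flags:

```r
x     <- c(14.1, 14.2, 14.3)
flags <- c("<0>", "<0>", "<0>")   # no failed points in this toy example

length(x[-grep("-3", flags)])    # 0 -- every row silently dropped
length(x[!grepl("-3", flags)])   # 3 -- every row correctly kept
```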

cc%>%
  ggplot(aes(x=date, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: Failed & suspect QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Roll apply using the custom stat function. This needs a column named "date" with no time stamp in order to work. Since data was collected every 15 minutes, 8 observations = a 2-hour window.

cc<-cc%>%
  tq_mutate(
    select= Temp,
    mutate_fun= rollapply,
    width= 8,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
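
The zoo warning above is about non-unique timestamps (duplicated stamps can come from the fall daylight-saving transition or logger repeats). A quick way to count and drop them, keeping the first occurrence; toy data for illustration:

```r
# Four 15-minute stamps with one duplicate, mimicking the cc layout.
df <- data.frame(
  datetime = as.POSIXct("2017-11-05 01:00", tz = "UTC") + c(0, 900, 900, 1800),
  Temp     = c(14.1, 14.2, 14.2, 14.3)
)

sum(duplicated(df$datetime))          # number of duplicated stamps
df <- df[!duplicated(df$datetime), ]  # keep first occurrence of each stamp
nrow(df)
```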

Filter to remove values that are more than +/- 2 standard deviations from the rolling mean

cc<-filter(cc, Temp <hi.95 & Temp >lo.95)
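
The rolling filter can be checked on synthetic data. This is a base R sketch of the same idea (no tidyquant needed): a right-aligned 8-observation window, with one planted spike that the +/- 2 sd rule should catch.

```r
set.seed(1)
temp  <- c(rnorm(50, mean = 15, sd = 0.3), 40, rnorm(20, mean = 15, sd = 0.3))
width <- 8                      # 8 obs = 2 hr at 15-min logging
keep  <- rep(TRUE, length(temp))

for (i in width:length(temp)) {
  win <- temp[(i - width + 1):i]          # right-aligned window ending at i
  m   <- mean(win, na.rm = TRUE)
  s   <- sd(win, na.rm = TRUE)
  keep[i] <- temp[i] < m + 2 * s & temp[i] > m - 2 * s
}

keep[51]   # FALSE: the planted spike (40) falls outside its window's bounds
```

Note that windows just after the spike still contain it, which inflates their sd; that is a known limitation of rolling-sd filters, not a bug in the loop.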

cc%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: Failed & suspect QC points and +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 12 rows containing missing values (geom_point).

Hourly median
For the timeAverage function to work, you need a date or POSIXct column named "date", but it must include the time stamp, so we rebuild it from datetime. Interval = frequency of data collection.

cc<-subset(cc, select=-c(date))
cc$date<-as.POSIXct(cc$datetime, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 12 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.1hr.med<-subset(cc.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
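
As a cross-check on timeAverage(), the hourly median can be reproduced in base R by flooring each timestamp to the hour and aggregating (toy data below; column names mirror the ones above):

```r
# Twelve 15-minute observations spanning three hours.
df <- data.frame(
  date = as.POSIXct("2017-01-01 00:00", tz = "UTC") + seq(0, 11) * 900,
  Temp = c(12.0, 12.1, 12.2, 12.1, 13.0, 13.1, 13.2, 13.1, 14.0, 14.1, 14.2, 14.1)
)

df$hour <- as.POSIXct(trunc(df$date, units = "hours"))  # floor to the hour
hourly  <- aggregate(Temp ~ hour, data = df, FUN = median)
hourly$Temp   # 12.1 13.1 14.1 -- one median per hour
```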

cc_temp<-cc.1hr.med %>% ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+ylim(5,35)+
  labs(title="Hourly median water temperature from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_temp
## Warning: Removed 921 rows containing missing values (geom_point).

Format and save

cc.1hr.med$date<-as.Date(cc.1hr.med$datetime, format = c("%Y-%m-%d"))

names(cc.1hr.med)[names(cc.1hr.med) == "Temp"] <- "water_temp"

write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_watertemp.csv")

EOS pier

Set up

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos$datetime<-as.POSIXct(eos$datetime, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
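
One subtlety with these ISO 8601 stamps: as.POSIXct() parses them in the local time zone unless tz = "UTC" is supplied, which silently shifts every time by the local UTC offset. A minimal check:

```r
s <- "2018-06-01T12:00:00Z"   # ERDDAP-style UTC stamp

utc <- as.POSIXct(s, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
format(utc, "%H:%M", tz = "UTC")   # "12:00" -- hour preserved
```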

eos<-subset(eos, select=c(date, datetime, temperature))

eos%>%
  ggplot(aes(x=datetime, y=temperature, color=temperature))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Looks good except for that set of high values at the end of 2018. This corresponds with the suspicious data in salinity. Likely bad data that needs to be removed.

Since this data set doesn't have a QC flag column, I'll remove values that are far outside the expected water temperature range in Tiburon.

eos<-filter(eos, temperature >0 & temperature <22)

eos%>%
  ggplot(aes(x=datetime, y=temperature, color=temperature))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from EOS research pier: Values outside of expected range removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Roll apply using the custom stat function. Data collected every 6 minutes, so 20 observations = 2 hours. Needs a column named "date" with no time stamp in order to work.

2hr

eos<-eos%>%
  tq_mutate(
    select= temperature,
    mutate_fun= rollapply,
    width= 20,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, temperature <hi.95 & temperature >lo.95)


ggplot(data = eos, mapping = aes(x = datetime, y = temperature, color=temperature)) + geom_point()+
  labs(title="Water temperature data from EOS research pier: Extreme values & +/- 2sd data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Looks pretty good

Hourly median

eos<-subset(eos, select=-c(date))
names(eos)[names(eos) == "datetime"] <- "date"
eos$date<-as.POSIXct(eos$date, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))

eos_temp<-eos.1hr.med %>% ggplot(aes(x=date, y=temperature, color=temperature))+
  geom_point(alpha=0.5)+ylim(5,35)+
  labs(title="Hourly median water temperature from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_temp
## Warning: Removed 513 rows containing missing values (geom_point).

Looks good. Not sure about those few points at the end of 2018. Again, probably bad data that I need to remove earlier on.

Format and save

names(eos.1hr.med)[names(eos.1hr.med) == "date"] <- "datetime"
eos.1hr.med$date<-as.Date(eos.1hr.med$datetime, format = c("%Y-%m-%d"))
names(eos.1hr.med)[names(eos.1hr.med) == "temperature"] <- "water_temp"

write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_watertemp.csv")

Richardson Bay

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, Temp, F_Temp, DateTimeStamp))

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC

rb<-rb[!grepl("-3",rb$F_Temp),]

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove data points flagged as “suspect”

rb<-rb[!grepl("1",rb$F_Temp),]

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature from Richardson Bay: Failed & suspect QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Roll apply using the custom stat function. Data collected every 15 minutes, so 2 hrs = 8 observations.

rb<-rb%>%
  tq_mutate(
    select= Temp,
    mutate_fun= rollapply,
    width= 8,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, Temp <hi.95 & Temp >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: Failed & suspect QC data + -/+2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 5 rows containing missing values (geom_point).

Hourly median

rb<-subset(rb, select=-c(date))
rb$date<-as.POSIXct(rb$datetime, format = c("%m/%d/%y %H:%M"))


rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 5 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb_temp<-rb.1hr.med%>%
  ggplot(aes(x=date, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+ylim(5,35)+
  labs(title="Hourly median water temperature data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_temp
## Warning: Removed 5 rows containing missing values (geom_point).

Format and save

rb.1hr.med<-subset(rb.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
names(rb.1hr.med)[names(rb.1hr.med) == "Temp"] <- "water_temp"
rb.1hr.med$date<-as.Date(rb.1hr.med$date, format = c("%Y-%m-%d"))
write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_watertemp.csv")

Fort Point

Set up

fp<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fort_point_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
 
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_temperature"] <- "temp"
names(fp)[names(fp) == "sea_water_temperature_qc_agg"] <- "F_temp"

fp<-subset(fp, select=c(datetime, date, temp, F_temp))

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: All Data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot again with a restricted y-axis for a better view

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+ylim(0,20)+
  labs(title="Water temperature data from Fort Point: All Data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 4 rows containing missing values (geom_point).

Quite a few points around 0 that don’t look right.

Remove flagged data; a flag of 4 indicates failure.

fp<-filter(fp, F_temp!=4)

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove suspect data

fp<-filter(fp, F_temp!=3)

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: Failed & suspect QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
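
For reference, the aggregate QC column appears to follow the QARTOD convention (1 = pass, 2 = not evaluated, 3 = suspect, 4 = fail, 9 = missing; worth confirming against the CeNCOOS metadata, since only 3 and 4 are stated above). A stricter alternative to the two sequential filters is to keep only passing rows, which also drops "not evaluated" data. Toy data for illustration:

```r
df <- data.frame(
  temp   = c(12.0, 12.1, 0.1, 11.9, 12.2),
  F_temp = c(1, 1, 4, 3, 2)
)

passed <- df[df$F_temp == 1, ]   # keep only flag 1 (pass)
nrow(passed)                     # 2
```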

Still has those points around 0 which just do not look right to me. Hopefully they’re removed with the next few steps.

Roll apply using the custom stat function. Data collected every 6 minutes, so 2 hrs = 20 observations.

fp<-fp%>%
  tq_mutate(
    select= temp,
    mutate_fun= rollapply,
    width= 20,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, temp <hi.95 & temp >lo.95)

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: Failed & suspect QC + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

fp<-subset(fp, select=-c(date))
fp$date<-as.POSIXct(fp$datetime, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

fp.1hr.med<- timeAverage(fp,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 29 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp_temp<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = temp, color=temp)) + geom_point()+ylim(5,35)+
    labs(title="Hourly median water temperature data from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
fp_temp
## Warning: Removed 3329 rows containing missing values (geom_point).

Most of the zeros were removed. I don't like those two points that stick out early on.

Format and save

fp.1hr.med<-subset(fp.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
names(fp.1hr.med)[names(fp.1hr.med) == "date"] <- "datetime"
fp.1hr.med$date<-as.Date(fp.1hr.med$datetime, format = c("%Y-%m-%d"))
names(fp.1hr.med)[names(fp.1hr.med) == "temp"] <- "water_temp"
write.csv(fp.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fp_watertemp.csv")

Graph together

all_temp<-ggarrange(cc_temp, eos_temp, rb_temp, fp_temp, nrow=4, ncol=1)
## Warning: Removed 921 rows containing missing values (geom_point).
## Warning: Removed 513 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 3329 rows containing missing values (geom_point).
all_temp